1.
Front Plant Sci ; 15: 1336726, 2024.
Article in English | MEDLINE | ID: mdl-38708388

ABSTRACT

In the post-genomic era, virus-induced gene silencing (VIGS) has played an important role in research on reverse genetics in plants. Commonly used Agrobacterium-mediated VIGS inoculation methods include stem scratching, leaf infiltration, use of agrodrench, and air-brush spraying. In this study, we developed a root wounding-immersion method in which 1/3 of the plant root (length) was cut and immersed in a tobacco rattle virus (TRV)1:TRV2 mixed solution for 30 min. We optimized the procedure in Nicotiana benthamiana and successfully silenced N. benthamiana, tomato (Solanum lycopersicum), pepper (Capsicum annuum L.), eggplant (Solanum melongena), and Arabidopsis thaliana phytoene desaturase (PDS), and we observed the movement of green fluorescent protein (GFP) from the roots to the stem and leaves. The silencing rate of PDS in N. benthamiana and tomato was 95-100%. In addition, we successfully silenced two disease-resistance genes, SITL5 and SITL6, to decrease disease resistance in tomatoes (CLN2037E). The root wounding-immersion method can be used to inoculate large batches of plants in a short time and with high efficiency, and fresh bacterial infusions can be reused several times. The most important aspect of the root wounding-immersion method is its application to plant species susceptible to root inoculation, as well as its ability to inoculate seedlings from early growth stages. This method offers a means to conduct large-scale functional genome screening in plants.

2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38678389

ABSTRACT

MOTIVATION: Over the past decade, single-cell transcriptomic technologies have experienced remarkable advancements, enabling the simultaneous profiling of gene expressions across thousands of individual cells. Cell type identification plays an essential role in exploring tissue heterogeneity and characterizing cell state differences. With more and more well-annotated reference data becoming available, massive automatic identification methods have sprung up to simplify the annotation process on unlabeled target data by transferring the cell type knowledge. However, in practice, the target data often include some novel cell types that are not in the reference data. Most existing works usually classify these private cells as one generic 'unassigned' group and learn the features of known and novel cell types in a coupled way. They are susceptible to the potential batch effects and fail to explore the fine-grained semantic knowledge of novel cell types, thus hurting the model's discrimination ability. Additionally, emerging spatial transcriptomic technologies, such as in situ hybridization, sequencing and multiplexed imaging, present a novel challenge to current cell type identification strategies that predominantly neglect spatial organization. Consequently, it is imperative to develop a versatile method that can proficiently annotate single-cell transcriptomics data, encompassing both spatial and non-spatial dimensions. RESULTS: To address these issues, we propose a new, challenging yet realistic task called universal cell type identification for single-cell and spatial transcriptomics data. In this task, we aim to give semantic labels to target cells from known cell types and cluster labels to those from novel ones. To tackle this problem, instead of designing a suboptimal two-stage approach, we propose an end-to-end algorithm called scBOL from the perspective of Bipartite prototype alignment. 
Firstly, we identify the mutual nearest clusters in reference and target data as their potential common cell types. On this basis, we mine the cycle-consistent semantic anchor cells to build the intrinsic structure association between the two datasets. Secondly, we design a neighbor-aware prototypical learning paradigm to strengthen the inter-cluster separability and intra-cluster compactness within each dataset, thereby yielding discriminative feature representations. Thirdly, driven by the semantic-aware prototypical learning framework, we can align the known cell types and separate the private cell types from them among reference and target data. Such an algorithm can be seamlessly applied to various data types modeled by different foundation models that can generate the embedding features for cells. Specifically, for non-spatial single-cell transcriptomics data, we use an autoencoder neural network to learn latent low-dimensional cell representations, and for spatial single-cell transcriptomics data, we apply a graph convolutional network to capture molecular and spatial similarities of cells jointly. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scBOL over various state-of-the-art cell type identification methods. To our knowledge, we are the first to present this pragmatic annotation task and to devise a comprehensive algorithmic framework aimed at resolving this challenge across varied types of single-cell data. Finally, scBOL is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scBOL.
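The first step of the scBOL abstract, identifying mutual nearest clusters between reference and target data, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of Euclidean distance between cluster centroids, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math

def centroid(points):
    """Mean vector of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def nearest(query, candidates):
    """Key of the candidate centroid closest (Euclidean) to `query`."""
    return min(candidates, key=lambda k: math.dist(query, candidates[k]))

def mutual_nearest_clusters(ref_clusters, tgt_clusters):
    """Return (ref_label, tgt_cluster) pairs that are each other's nearest centroid."""
    ref_c = {k: centroid(v) for k, v in ref_clusters.items()}
    tgt_c = {k: centroid(v) for k, v in tgt_clusters.items()}
    pairs = []
    for r, rc in ref_c.items():
        t = nearest(rc, tgt_c)               # closest target cluster to this reference type
        if nearest(tgt_c[t], ref_c) == r:    # and the relation holds in reverse
            pairs.append((r, t))
    return pairs

# Hypothetical toy embeddings: two annotated reference types, three target clusters.
ref = {"T cell": [[0.0, 0.0], [0.2, 0.1]], "B cell": [[5.0, 5.0], [5.1, 4.9]]}
tgt = {0: [[0.1, 0.1], [0.0, 0.2]], 1: [[5.2, 5.0]], 2: [[10.0, -3.0], [10.2, -3.1]]}
matches = mutual_nearest_clusters(ref, tgt)
```

In this toy case target clusters 0 and 1 pair with the two reference types, while cluster 2 has no mutual partner and would be treated as a candidate novel cell type.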


Subjects
Gene Expression Profiling, Single-Cell Analysis, Transcriptome, Single-Cell Analysis/methods, Humans, Gene Expression Profiling/methods, Algorithms, Computational Biology/methods, Software
3.
Plants (Basel) ; 13(8)2024 Apr 13.
Article in English | MEDLINE | ID: mdl-38674502

ABSTRACT

Trichomes are specialized organs located in the plant epidermis that play important defense roles against biotic and abiotic stresses. However, the mechanisms regulating the development of pepper epidermal trichomes and the related regulatory genes at the molecular level are not clear. Therefore, we performed transcriptome analyses of A114 (fewer trichomes) and A115 (more trichomes) to identify the genes involved in the regulatory mechanisms of epidermal trichome development in peppers. Phenotypic observation showed that the epidermal trichome density of A115 was higher and was highest in the leaves at the flowering stage. A total of 39,261 genes were quantified by RNA-Seq, including 11,939 genes not annotated in the previous genome analysis and 18,833 differentially expressed genes (DEGs). KEGG functional enrichment showed that the DEGs were mainly concentrated in three pathways: plant-pathogen interaction, MAPK signaling pathway-plant, and plant hormone signal transduction. We further screened the DEGs associated with the development of epidermal trichomes in peppers. The expression of the plant signaling genes GID1B-like (Capana03g003488) and PR-6 (Capana09g001847), the transcription factors MYB108 (Capana05g002225) and ABR1-like (Capana04g001261), and the plant resistance genes PGIP-like (Capana09g002077) and At5g49770 (Capana08g001721) was higher in A115 than in A114, and these genes were highly expressed in leaves at the flowering stage. In addition, the WGCNA results and the co-expression networks showed that the above genes were highly positively correlated with each other. The transcriptomic data and analysis of this study provide a basis for the study of the regulatory mechanisms of pepper epidermal trichomes.

4.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38366803

ABSTRACT

The evolution in single-cell RNA sequencing (scRNA-seq) technology has opened a new avenue for researchers to inspect cellular heterogeneity with single-cell precision. One crucial aspect of this technology is cell-type annotation, which is fundamental for any subsequent analysis in single-cell data mining. Recently, the scientific community has seen a surge in the development of automatic annotation methods aimed at this task. However, these methods generally operate at a steady-state total cell-type capacity, significantly restricting the cell annotation systems' capacity for continuous knowledge acquisition. Furthermore, creating a unified scRNA-seq annotation system remains challenged by the need to progressively expand its understanding of ever-increasing cell-type concepts derived from a continuous data stream. In response to these challenges, this paper presents a novel and challenging setting for annotation, namely cell-type incremental annotation. This concept is designed to perpetually enhance cell-type knowledge, gleaned from continuously incoming data. This task encounters difficulty with data stream samples that can only be observed once, leading to catastrophic forgetting. To address this problem, we introduce our breakthrough methodology termed scEVOLVE, an incremental annotation method. This innovative approach is built upon the methodology of contrastive sample replay combined with the fundamental principle of partition confidence maximization. Specifically, we initially retain and replay sections of the old data in each subsequent training phase, then establish a unique prototypical learning objective to mitigate the cell-type imbalance problem, as an alternative to using cross-entropy. To effectively emulate a model that trains concurrently with complete data, we introduce a cell-type decorrelation strategy that efficiently scatters feature representations of each cell type uniformly. 
We constructed the scEVOLVE framework with simplicity and ease of integration into most deep softmax-based single-cell annotation methods. Thorough experiments conducted on a range of meticulously constructed benchmarks consistently show that our methodology can incrementally learn numerous cell types over an extended period, outperforming other strategies that fail quickly. To the best of our knowledge, this is the first attempt to propose and formulate an end-to-end algorithm framework to address this new, practical task. Additionally, scEVOLVE, coded in Python using the PyTorch machine-learning library, is freely accessible at https://github.com/aimeeyaoyao/scEVOLVE.


Subjects
Algorithms, Single-Cell Gene Expression Analysis, Benchmarking, Entropy, Gene Library, RNA Sequence Analysis, Gene Expression Profiling, Cluster Analysis
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38388681

ABSTRACT

MOTIVATION: Cell-type annotation of single-cell RNA-sequencing (scRNA-seq) data is a hallmark of biomedical research and clinical application. Current annotation tools usually assume the simultaneous acquisition of well-annotated data, but without the ability to expand knowledge from new data. Yet, such tools are inconsistent with the continuous emergence of scRNA-seq data, calling for a continuous cell-type annotation model. In addition, by their powerful ability of information integration and model interpretability, transformer-based pre-trained language models have led to breakthroughs in single-cell biology research. Therefore, the systematic combining of continual learning and pre-trained language models for cell-type annotation tasks is inevitable. RESULTS: We herein propose a universal cell-type annotation tool, called CANAL, that continuously fine-tunes a pre-trained language model trained on a large amount of unlabeled scRNA-seq data, as new well-labeled data emerges. CANAL essentially alleviates the dilemma of catastrophic forgetting, both in terms of model inputs and outputs. For model inputs, we introduce an experience replay schema that repeatedly reviews previous vital examples in current training stages. This is achieved through a dynamic example bank with a fixed buffer size. The example bank is class-balanced and proficient in retaining cell-type-specific information, particularly facilitating the consolidation of patterns associated with rare cell types. For model outputs, we utilize representation knowledge distillation to regularize the divergence between previous and current models, resulting in the preservation of knowledge learned from past training stages. Moreover, our universal annotation framework considers the inclusion of new cell types throughout the fine-tuning and testing stages. 
We can continuously expand the cell-type annotation library by absorbing new cell types from newly arrived, well-annotated training datasets, as well as automatically identify novel cells in unlabeled datasets. Comprehensive experiments with data streams under various biological scenarios demonstrate the versatility and high model interpretability of CANAL. AVAILABILITY: An implementation of CANAL is available from https://github.com/aster-ww/CANAL-torch. CONTACT: dengmh@pku.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.
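The class-balanced example bank with a fixed buffer size described in the CANAL abstract can be sketched as follows. This is only an illustration under stated assumptions: the class name `ExampleBank`, its methods, and the random choice of which examples to keep are hypothetical, and the abstract's selection of "vital" examples is replaced here by uniform sampling.

```python
import random
from collections import defaultdict

class ExampleBank:
    """Fixed-size replay buffer that keeps an (approximately) equal quota per class."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.by_class = defaultdict(list)

    def update(self, examples):
        """Add (features, label) pairs, then shrink every class to its quota."""
        for x, y in examples:
            self.by_class[y].append(x)
        if not self.by_class:
            return
        quota = self.capacity // len(self.by_class)
        for y, items in self.by_class.items():
            if len(items) > quota:
                # Assumption: keep a uniform random subset; a real system
                # would rank examples by some importance criterion.
                self.by_class[y] = self.rng.sample(items, quota)

    def replay(self):
        """All stored (features, label) pairs, to be mixed into the next training stage."""
        return [(x, y) for y, items in self.by_class.items() for x in items]

    def __len__(self):
        return sum(len(v) for v in self.by_class.values())

# Stage 1: a common and a rare cell type; stage 2: a new cell type arrives.
bank = ExampleBank(capacity=6)
bank.update([([i], "common") for i in range(100)] + [([i], "rare") for i in range(2)])
size_after_stage1 = len(bank)
bank.update([([i], "new") for i in range(10)])
counts = {y: len(v) for y, v in bank.by_class.items()}
```

Because the quota is recomputed whenever a new class appears, the rare type keeps its two examples while the common type is trimmed, which is the sense in which a balanced bank helps retain rare cell types.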


Subjects
Gene Expression Profiling, Software, Gene Expression Profiling/methods, Single-Cell Gene Expression Analysis, Single-Cell Analysis/methods, Language, RNA Sequence Analysis/methods
6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279647

ABSTRACT

MOTIVATION: The rapid development of spatial transcriptome technologies has enabled researchers to acquire single-cell-level spatial data at an affordable price. However, computational analysis tools, such as annotation tools, tailored for these data are still lacking. Recently, many computational frameworks have emerged to integrate single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics datasets. While some frameworks can utilize well-annotated scRNA-seq data to annotate spatial expression patterns, they overlook critical aspects. First, existing tools do not explicitly consider cell type mapping when aligning the two modalities. Second, current frameworks lack the capability to detect novel cells, which remains a key interest for biologists. RESULTS: To address these problems, we propose an annotation method for spatial transcriptome data called SPANN. The main tasks of SPANN are to transfer cell-type labels from well-annotated scRNA-seq data to newly generated single-cell resolution spatial transcriptome data and discover novel cells from spatial data. The major innovations of SPANN come from two aspects: SPANN automatically detects novel cells from unseen cell types while maintaining high annotation accuracy over known cell types. SPANN finds a mapping between spatial transcriptome samples and RNA data prototypes and thus conducts cell-type-level alignment. Comprehensive experiments using datasets from various spatial platforms demonstrate SPANN's capabilities in annotating known cell types and discovering novel cell states within complex tissue contexts. AVAILABILITY: The source code of SPANN can be accessed at https://github.com/ddb-qiwang/SPANN-torch. CONTACT: dengmh@math.pku.edu.cn.


Subjects
Single-Cell Gene Expression Analysis, Transcriptome, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Gene Expression Profiling/methods, Software
7.
Adv Healthc Mater ; 13(3): e2302117, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37922499

ABSTRACT

Prostate-specific antigen (PSA) is the common serum biomarker for early prostate cancer (PCa) detection in clinical diagnosis. However, it is difficult to accurately diagnose PCa in the early stage due to the low specificity of PSA. Herein, a new dual-gate solution-gated graphene field transistor (SGGT) biosensor for dual-biomarker detection is designed. The sensing mechanism is that the designed aptamers immobilized on the surface of the gate electrodes capture PSA and sarcosine (SAR) biomolecules and induce capacitance changes in the electric double layers of the SGGT. The limits of detection for the PSA and SAR biomarkers can reach 0.01 fg mL⁻¹, which is three to four orders of magnitude lower than previously reported assays. The detection times for PSA and SAR are ≈4.5 and ≈13 min, respectively, significantly faster than the detection time (1-2 h) of conventional methods. Testing of clinical serum samples demonstrates that the biosensor can distinguish PCa patients from the control group, with a diagnostic accuracy of up to 100%. The SGGT biosensor can be integrated into a portable platform, and the diagnostic results can be displayed directly on a smartphone or tablet. Therefore, the integrated portable platform of the biosensor can distinguish cancer types through dual-biomarker detection.


Subjects
Biosensing Techniques, Graphite, Prostatic Neoplasms, Male, Humans, Prostate-Specific Antigen, Prostatic Neoplasms/diagnosis, Electrodes, Biosensing Techniques/methods
8.
Int J Mol Sci ; 24(23)2023 Nov 21.
Article in English | MEDLINE | ID: mdl-38068885

ABSTRACT

Carotenoids are important pigments in pepper fruits. The color of each pepper is mainly determined by the composition and content of carotenoids. The 'ZY' variety, which has yellow fruit, is a natural mutant derived from a branch mutant of 'ZR' with different colors. ZY and ZR exhibit obvious differences in fruit color but not in other traits. To investigate the main reasons for the formation of differently colored pepper fruits, transcriptome and metabolome analyses were performed at three developmental stages (S1-S3) in the two cultivars. The results revealed that structural genes related to carotenoid biosynthesis (PSY1, CRTISO, CCD1, CYP97C1, VDE1, CCS, NCED1 and NCED2) were differentially expressed in the two cultivars. Capsanthin and capsorubin mainly accumulated in ZR and were almost non-existent in ZY. S2 is the fruit color-changing stage and may be a critical period for the divergence in color between ZY and ZR. A combination of transcriptome and metabolome analyses indicated that the CCS, NCED2, AAO4, VDE1 and CYP97C1 genes were key to the differences in the total carotenoid content. These new insights into pepper fruit coloration may help to improve fruit breeding strategies.


Subjects
Carotenoids, Plant Breeding, Carotenoids/metabolism, Gene Expression Profiling, Fruit/metabolism, Transcriptome, Metabolome, Plant Gene Expression Regulation
9.
Anal Chem ; 95(48): 17750-17758, 2023 12 05.
Article in English | MEDLINE | ID: mdl-37971943

ABSTRACT

A new type of carbon dot (CD)-functionalized solution-gated graphene transistor (SGGT) sensor was designed and fabricated for the highly sensitive and highly selective detection of glutathione (GSH). The CDs were synthesized via a one-step hydrothermal method using DL-thioctic acid and triethylenetetramine (TETA) as sources of S, N, and C. The CDs have abundant amino and carboxyl groups and were used to modify the surface of the gate electrode of the SGGT as probes for detecting GSH. Remarkably, the CDs-SGGT sensor exhibited excellent selectivity and ultrahigh sensitivity to GSH, with an ultralow limit of detection (LOD) as low as 10⁻¹⁹ M. To the best of our knowledge, the sensor outperforms previously reported systems. Moreover, the CDs-SGGT sensor shows rapid detection and good stability. More importantly, the detection of GSH in artificial serum samples was successfully demonstrated.


Subjects
Graphite, Quantum Dots, Carbon, Limit of Detection, Glutathione
10.
Genome Res ; 33(10): 1788-1805, 2023 10.
Article in English | MEDLINE | ID: mdl-37827697

ABSTRACT

Cell-cell communication (CCC) is critical for determining cell fates and functions in multicellular organisms. With the advent of single-cell RNA-sequencing (scRNA-seq) and spatial transcriptomics (ST), an increasing number of CCC inference methods have been developed. Nevertheless, a thorough comparison of their performances is yet to be conducted. To fill this gap, we developed a systematic benchmark framework called ESICCC to evaluate 18 ligand-receptor (LR) inference methods and five ligand/receptor-target inference methods using a total of 116 data sets, including 15 ST data sets, 15 sets of cell line perturbation data, two sets of cell type-specific expression/proteomics data, and 84 sets of sampled or unsampled scRNA-seq data. We evaluated and compared the agreement, accuracy, robustness, and usability of these methods. Regarding accuracy evaluation, RNAMagnet, CellChat, and scSeqComm emerge as the three best-performing methods for intercellular ligand-receptor inference based on scRNA-seq data, whereas stMLnet and HoloNet are the best methods for predicting ligand/receptor-target regulation using ST data. To facilitate the practical applications, we provide a decision-tree-style guideline for users to easily choose best tools for their specific research concerns in CCC inference, and develop an ensemble pipeline CCCbank that enables versatile combinations of methods and databases. Moreover, our comparative results also uncover several critical influential factors for CCC inference, such as prior interaction information, ligand-receptor scoring algorithm, intracellular signaling complexity, and spatial relationship, which may be considered in the future studies to advance the development of new methodologies.


Subjects
Single-Cell Analysis, Software, Ligands, Single-Cell Analysis/methods, Algorithms, Cell Communication/genetics, RNA Sequence Analysis/methods
11.
Genes (Basel) ; 14(9)2023 08 30.
Article in English | MEDLINE | ID: mdl-37761877

ABSTRACT

Plant homeodomain (PHD) transcription factor genes are involved in plant development and in a plant's response to stress. However, there are few reports about this gene family in peppers (Capsicum annuum L.). In this study, the pepper inbred line "Zunla-1" was used as the reference genome, and a total of 43 PHD genes were identified, and systematic analysis was performed to study the chromosomal location, evolutionary relationship, gene structure, domains, and upstream cis-regulatory elements of the CaPHD genes. The fewest CaPHD genes were located on chromosome 4, while the most were on chromosome 3. Genes with similar gene structures and domains were clustered together. Expression analysis showed that the expression of CaPHD genes was quite different in different tissues and in response to various stress treatments. The expression of CaPHD17 was different in the early stage of flower bud development in the near-isogenic cytoplasmic male-sterile inbred and the maintainer inbred lines. It is speculated that this gene is involved in the development of male sterility in pepper. CaPHD37 was significantly upregulated in leaves and roots after heat stress, and it is speculated that CaPHD37 plays an important role in tolerating heat stress in pepper; in addition, CaPHD9, CaPHD10, CaPHD11, CaPHD17, CaPHD19, CaPHD20, and CaPHD43 were not sensitive to abiotic stress or hormonal factors. This study will provide the basis for further research into the function of CaPHD genes in plant development and responses to abiotic stresses and hormones.


Subjects
Food, Piper nigrum, Humans, Homeobox Genes, Physiological Stress/genetics, Transcription Factors/genetics, Flowers/genetics
12.
Genes (Basel) ; 14(9)2023 Sep 12.
Article in English | MEDLINE | ID: mdl-37761928

ABSTRACT

An in-house tomato inbred line, YNAU335, was planted in a greenhouse in spring from 2014 to 2017, and showed immunity to tomato spotted wilt virus (TSWV). YNAU335 was infected with TSWV in the spring from 2018 to 2020, and disease was observed on the leaves, sepals, and fruits. In 2021 and 2022, YNAU335 was planted in spring in the same greenhouse, which was suspected of being infected with TSWV, and visible disease symptoms were observed on the fruits. Transmission electron microscopy, deep sequencing of small RNAs, and molecular mutation diagnosis were used to analyze the pathological features and genetic polymorphism of TSWV infecting tomato fruit. Typical TSWV virions were observed in the infected fruits, but not leaves from YNAU335 grown between 2021 and 2022, and cross-infection was very rarely observed. The number of mitochondria and chloroplasts increased, but the damage to the mitochondria was greater than that seen in the chloroplasts. Small RNA deep sequencing revealed the presence of multiple viral species in TSWV-infected and non-infected tomato samples grown between 2014-2022. Many virus species, including TSWV, which accounted for the largest proportion, were detected in the TSWV-infected tomato leaves and fruit. However, a variety of viruses other than TSWV were also detected in the non-infected tissues. The amino acids of TSWV nucleocapsid proteins (NPs) and movement proteins (MPs) from diseased fruits of YNAU335 picked in 2021-2022 were found to be very diverse. Compared with previously identified NPs and MPs from TSWV isolates, those found in this study could be divided into three types: non-resistance-breaking, resistance-breaking, and other isolates. 
The number of positive clones and a comparison with previously identified amino acid mutations suggested that mutation F at AA118 of the MP (GenBank OL310707) is likely the key to breaking the resistance to TSWV, and this mutation developed only in the infected fruit of YNAU335 grown in 2021 and 2022.

13.
Genes (Basel) ; 14(8)2023 08 14.
Article in English | MEDLINE | ID: mdl-37628673

ABSTRACT

Although thaumatin-like proteins (TLPs) are involved in resistance to a variety of fungal diseases, whether the TLP5 and TLP6 genes in tomato plants (Solanum lycopersicum) confer resistance to the pathogenesis of soil-borne diseases has not been demonstrated. In this study, five soil-borne diseases (fungal pathogens: Fusarium solani, Fusarium oxysporum, and Verticillium dahliae; bacterial pathogens: Clavibacter michiganense subsp. michiganense and Ralstonia solanacearum) were used to infect susceptible "No. 5" and disease-resistant "S-55" tomato cultivars. We found that SlTLP5 and SlTLP6 transcript levels were higher in susceptible cultivars treated with the three fungal pathogens than in those treated with the two bacterial pathogens and that transcript levels varied depending on the pathogen. Moreover, the SlTLP5 and SlTLP6 transcript levels were much higher in disease-resistant cultivars than in disease-susceptible cultivars, and the SlTLP5 and SlTLP6 transcript levels were higher in cultivars treated with the same fungal pathogen than in those treated with bacterial pathogens. SlTLP6 transcript levels were higher than SlTLP5. SlTLP5 and SlTLP6 overexpression and gene-edited transgenic mutants were generated in both susceptible and resistant cultivars. Overexpression and knockout increased and decreased resistance to the five diseases, respectively. Transgenic plants overexpressing SlTLP5 and SlTLP6 inhibited the activities of peroxidase (POD), superoxide dismutase (SOD), ascorbate peroxidase (APX), and catalase (CAT) after inoculation with fungal pathogens, and the activities of POD, SOD, and APX were similar to those of fungi after infection with bacterial pathogens. The activities of CAT were increased, and the activity of β-1,3-glucanase was increased in both the fungal and bacterial treatments. Overexpressed plants were more resistant than the control plants. 
After SlTLP5 and SlTLP6 knockout plants were inoculated, POD, SOD, and APX had no significant changes, but CAT activity increased and decreased significantly after the fungal and bacterial treatments, contrary to overexpression. The activity of β-1,3-glucanase decreased in the treatment of the five pathogens, and the knocked-out plants were more susceptible to disease than the control. In summary, this study contributes to the further understanding of TLP disease resistance mechanisms in tomato plants.


Subjects
Solanum lycopersicum, Solanum lycopersicum/genetics, Peroxidase, Superoxide Dismutase, Peroxidases, Ascorbate Peroxidases
14.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37369035

ABSTRACT

MOTIVATION: In recent years, high-throughput sequencing technologies have made large-scale protein sequences accessible. However, their functional annotations usually rely on low-throughput and pricey experimental studies. Computational prediction models offer a promising alternative to accelerate this process. Graph neural networks have shown significant progress in protein research, but capturing long-distance structural correlations and identifying key residues in protein graphs remains challenging. RESULTS: In the present study, we propose a novel deep learning model named Hierarchical graph transformEr with contrAstive Learning (HEAL) for protein function prediction. The core feature of HEAL is its ability to capture structural semantics using a hierarchical graph Transformer, which introduces a range of super-nodes mimicking functional motifs to interact with nodes in the protein graph. These semantic-aware super-node embeddings are then aggregated with varying emphasis to produce a graph representation. To optimize the network, we utilized graph contrastive learning as a regularization technique to maximize the similarity between different views of the graph representation. Evaluation of the PDBch test set shows that HEAL-PDB, trained on fewer data, achieves comparable performance to the recent state-of-the-art methods, such as DeepFRI. Moreover, HEAL, with the added benefit of unresolved protein structures predicted by AlphaFold2, outperforms DeepFRI by a significant margin on Fmax, AUPR, and Smin metrics on PDBch test set. Additionally, when there are no experimentally resolved structures available for the proteins of interest, HEAL can still achieve better performance on AFch test set than DeepFRI and DeepGOPlus by taking advantage of AlphaFold2 predicted structures. Finally, HEAL is capable of finding functional sites through class activation mapping. 
AVAILABILITY AND IMPLEMENTATION: Implementations of our HEAL can be found at https://github.com/ZhonghuiGu/HEAL.
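The Fmax metric mentioned in the HEAL abstract is commonly computed in a protein-centric way: at each score threshold, precision is averaged over proteins with at least one prediction, recall over all proteins, and the best F-score over thresholds is reported. The sketch below assumes that common definition; the abstract does not state the exact evaluation code used, and the function name, input layout, and threshold grid are illustrative choices.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax.

    y_true  : list of non-empty sets of true function terms, one per protein
    y_score : list of dicts mapping term -> predicted score in [0, 1]
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 100)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            if predicted:                       # precision only where something is predicted
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(truth))   # recall over every protein
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best

# Hypothetical toy data: two proteins with made-up term identifiers.
truth = [{"GO:1", "GO:2"}, {"GO:3"}]
scores = [{"GO:1": 0.9, "GO:3": 0.8}, {"GO:3": 0.8}]
toy_fmax = fmax(truth, scores)
```

Averaging precision only over proteins with predictions keeps a high threshold from being penalized twice, since the missing predictions already lower recall.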


Subjects
Benchmarking, High-Throughput Nucleotide Sequencing, Amino Acid Sequence, Computer Neural Networks, Semantics
15.
Adv Healthc Mater ; 12(25): e2300563, 2023 10.
Article in English | MEDLINE | ID: mdl-37377126

ABSTRACT

Persistent infection with high-risk human papillomavirus type 16 (HPV16) is considered an essential factor in the development of cervical cancer. Although polymerase chain reaction, loop-mediated amplification, and microfluidic chips are used to detect HPV16, these methods still have drawbacks, including long assay times and false-positive results. The CRISPR-Cas system is widely used in the field of biological detection due to its precise targeted recognition capability. In this contribution, a novel solution-gated graphene transistor sensor is designed to realize unamplified and label-free detection of HPV16 DNA. Using the precise recognition of the CRISPR-Cas12a system and gate functionalization, HPV16 DNA can be precisely identified without amplification or labeling. The limit of detection of the sensor can reach 8.3 × 10⁻¹⁸ M, and detection can be completed within 20 min. Additionally, heat-inactivated clinical samples can be clearly distinguished by the sensor, and the diagnostic results show a high degree of agreement with q-PCR detection.


Subjects
CRISPR-Cas Systems, Graphite, Humans, Human Papillomavirus 16/genetics, DNA/genetics, Nucleic Acid Amplification Techniques
16.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36869836

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology allows us to study gene expression heterogeneity at the cellular level. Cell annotation is the basis for subsequent downstream analysis in single-cell data mining. As more and more well-annotated scRNA-seq reference data become available, many automatic annotation methods have sprung up in order to simplify the cell annotation process on unlabeled target data. However, existing methods rarely explore the fine-grained semantic knowledge of novel cell types absent from the reference data, and they are usually susceptible to batch effects on the classification of seen cell types. Taking into consideration the limitations above, this paper proposes a new and practical task called generalized cell type annotation and discovery for scRNA-seq data whereby target cells are labeled with either seen cell types or cluster labels, instead of a unified 'unassigned' label. To accomplish this, we carefully design a comprehensive evaluation benchmark and propose a novel end-to-end algorithmic framework called scGAD. Specifically, scGAD first builds the intrinsic correspondences on seen and novel cell types by retrieving geometrically and semantically mutual nearest neighbors as anchor pairs. Together with the similarity affinity score, a soft anchor-based self-supervised learning module is then designed to transfer the known label information from reference data to target data and aggregate the new semantic knowledge within target data in the prediction space. To enhance the inter-type separation and intra-type compactness, we further propose a confidential prototype self-supervised learning paradigm to implicitly capture the global topological structure of cells in the embedding space. Such a bidirectional dual alignment mechanism between embedding space and prediction space can better handle batch effect and cell type shift. 
Extensive results on large-scale simulated datasets and real datasets demonstrate the superiority of scGAD over various state-of-the-art clustering and annotation methods. We also implement marker gene identification to validate the effectiveness of scGAD in clustering novel cell types and their biological significance. To the best of our knowledge, we are the first to introduce this new and practical task and propose an end-to-end algorithmic framework to solve it. Our method scGAD is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scGAD.
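
The anchor-pair step mentioned in the abstract can be illustrated with a minimal sketch of mutual nearest neighbors (MNN). This is not the scGAD implementation: it covers only the geometric part (Euclidean distance in an embedding space), and the function names, the toy 2-D coordinates, and the choice of k are assumptions made for illustration.

```python
import math

def k_nearest(queries, candidates, k):
    """For each query vector, return the set of indices of its k nearest candidates."""
    result = []
    for q in queries:
        # Rank candidate indices by Euclidean distance to the query.
        order = sorted(range(len(candidates)), key=lambda j: math.dist(q, candidates[j]))
        result.append(set(order[:k]))
    return result

def mutual_nearest_neighbors(reference, target, k=1):
    """Return (ref_idx, tgt_idx) pairs that appear in each other's k-nearest lists."""
    ref_to_tgt = k_nearest(reference, target, k)
    tgt_to_ref = k_nearest(target, reference, k)
    return [
        (i, j)
        for i in range(len(reference))
        for j in sorted(ref_to_tgt[i])
        if i in tgt_to_ref[j]
    ]

# Hypothetical embeddings: two reference cells and three target cells.
reference = [(0.0, 0.0), (5.0, 5.0)]
target = [(0.1, 0.0), (5.0, 5.2), (20.0, 20.0)]

anchors = mutual_nearest_neighbors(reference, target, k=1)
print(anchors)  # [(0, 0), (1, 1)]
```

In this toy input the third target cell is nobody's nearest neighbor, so it receives no anchor; cells like this are the ones a method would have to treat as candidates for a novel type rather than label from the reference.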


Subjects
Algorithms , Gene Expression Profiling , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Computer Simulation , Cluster Analysis , Sequence Analysis, RNA/methods
17.
Elife ; 12, 2023 02 17.
Article in English | MEDLINE | ID: mdl-36799896

ABSTRACT

Allostery is fundamental to many biological processes. Because allosteric regulation acts at a distance, it is difficult to forecast how allosteric mutations, modifications, and effector binding affect protein function. In protein engineering, remote mutations cannot be rationally designed without large-scale experimental screening. Allosteric drugs have attracted much attention because of their high specificity and their potential to overcome existing drug-resistance mutations. However, optimization of allosteric compounds remains challenging. Here, we developed a novel computational method, KeyAlloSite, to predict allosteric sites and to identify key allosteric residues (allo-residues) based on an evolutionary coupling model. We found that protein allosteric sites are more strongly coupled to the orthosteric site than non-functional sites are. We further inferred key allo-residues by comparing, pair by pair, the differences in evolutionary coupling scores between each residue in the allosteric pocket and the functional site. Our predicted key allo-residues agree with previous experimental studies of typical allosteric proteins such as BCR-ABL1, Tar, and PDZ3, as well as with key cancer mutations. We also showed that KeyAlloSite can be used to predict key allosteric residues that are distant from the catalytic site but important for enzyme catalysis. Our study demonstrates that weak coevolutionary couplings contain important information about protein allosteric regulation. KeyAlloSite can be applied in studying the evolution of protein allosteric regulation, designing and optimizing allosteric drugs, and performing functional protein design and enzyme engineering.


Subjects
Proteins , Proteins/metabolism , Allosteric Site , Allosteric Regulation/genetics , Catalytic Domain
18.
Protein Sci ; 32(2): e4555, 2023 02.
Article in English | MEDLINE | ID: mdl-36564866

ABSTRACT

The development of efficient computational methods for drug target protein identification can compensate for the high cost of experiments and is therefore of great significance for drug development. However, existing structure-based drug target protein-identification algorithms are limited by the insufficient number of proteins with experimentally resolved structures. Moreover, sequence-based algorithms cannot effectively extract information from protein sequences and thus display insufficient accuracy. Here, we combined the sequence-based self-supervised pretrained protein language model ESM1b with a graph convolutional neural network classifier to develop an improved, sequence-based drug target protein identification method. This complete model, named QuoteTarget, efficiently encodes proteins based on sequence information alone and achieves an accuracy of 95% on the nonredundant drug target and nondrug target datasets constructed for this study. When applied to all proteins from Homo sapiens, QuoteTarget identified 1213 potential undeveloped drug target proteins. We further inferred residue-binding weights from the well-trained network using the gradient-weighted class activation mapping (Grad-CAM) algorithm. Notably, we found that without any binding site information as input, significant residues inferred by the model closely match the experimentally confirmed drug molecule-binding sites. Thus, our work provides a highly effective sequence-based identifier for drug target proteins and yields new insights into recognizing drug molecule-binding sites. The entire model is available at https://github.com/Chenjxjx/drug-target-prediction.
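
The residue-weighting step mentioned in the abstract relies on Grad-CAM, whose core computation can be sketched in a few lines. The sketch below is not the QuoteTarget code: it assumes that per-residue feature activations and the gradients of the class score with respect to them have already been extracted from a trained network, and the function name and the toy numbers are hypothetical.

```python
def grad_cam_1d(activations, gradients):
    """Per-residue Grad-CAM scores.

    activations, gradients: nested lists shaped [n_residues][n_channels],
    where gradients holds d(class score)/d(activation).
    """
    n_res = len(activations)
    n_ch = len(activations[0])
    # Channel weight: the gradient averaged over all residues.
    alphas = [sum(gradients[i][k] for i in range(n_res)) / n_res for k in range(n_ch)]
    # Residue score: ReLU of the weighted sum of that residue's channel activations.
    return [
        max(0.0, sum(alphas[k] * activations[i][k] for k in range(n_ch)))
        for i in range(n_res)
    ]

# Hypothetical values for a 3-residue protein with 2 feature channels.
activations = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
gradients = [[0.5, -1.0], [0.5, -1.0], [0.5, -1.0]]

weights = grad_cam_1d(activations, gradients)
print(weights)  # [0.5, 0.0, 0.5]
```

The ReLU keeps only residues whose features push the class score up, which is why the second residue, dominated by a negatively weighted channel, receives a weight of zero.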


Subjects
Neural Networks, Computer , Proteins , Humans , Proteins/chemistry , Algorithms , Binding Sites , Amino Acid Sequence
19.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36383167

ABSTRACT

MOTIVATION: Single-cell multi-omics sequencing techniques have developed rapidly in the past few years. Clustering analysis of single-cell multi-omics data may offer novel perspectives for dissecting cellular heterogeneity. However, multi-omics data are inherently high-dimensional, highly sparse and contaminated by doublets. Moreover, representations of different omics, even from the same cell, follow diverse distributions. Without proper distribution alignment techniques, clustering methods produce less separable clusters that are easily affected by less informative omics data. RESULTS: We developed MoClust, a novel joint clustering framework that can be applied to several types of single-cell multi-omics data. A selective automatic doublet detection module that identifies and filters out doublets is introduced in the pretraining stage to improve data quality. Omics-specific autoencoders are introduced to characterize the multi-omics data. A contrastive-learning approach to distribution alignment is adopted to adaptively fuse omics representations into an omics-invariant representation. This novel form of alignment improves the compactness and separability of clusters, while accurately weighting the contribution of each omics to the clustering object. Extensive experiments on both simulated and real multi-omics datasets demonstrated the powerful alignment, doublet detection and clustering abilities of MoClust. AVAILABILITY AND IMPLEMENTATION: An implementation of MoClust is available from https://doi.org/10.5281/zenodo.7306504. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Accuracy , Multiomics , Cluster Analysis
20.
Adv Sci (Weinh) ; 10(4): e2205886, 2023 02.
Article in English | MEDLINE | ID: mdl-36480308

ABSTRACT

The incidence of prostate cancer (PCa) in men globally increases as the standard of living improves. Detection of the blood serum biomarker prostate-specific antigen (PSA) is the gold standard assay, but it does not meet the requirements of early detection. Herein, a solution-gated graphene transistor (SGGT) biosensor for the ultrasensitive and rapid quantitative detection of the early prostate cancer-relevant biomarker miRNA-21 is reported. The designed single-stranded DNA (ssDNA) probes immobilized on the Au gate can hybridize effectively with the target miRNA-21 molecules and induce Dirac voltage shifts in the SGGT transfer curves. The limit of detection (LOD) of the sensor can reach 10⁻²⁰ M without amplification or any chemical or biological labeling. The linear detection range is from 10⁻²⁰ to 10⁻¹² M. The sensor can detect miRNA-21 molecules in real time in less than 5 min and can clearly distinguish a miRNA-21 molecule with a single mismatch. Blood serum samples from patients were measured without RNA extraction or amplification. The results demonstrated that the biosensor can clearly distinguish the cancer patients from the control group and has higher sensitivity (100%) than PSA detection (58.3%). By contrast, the PSA level was found not to be directly related to PCa.


Subjects
Graphite , MicroRNAs , Prostatic Neoplasms , Male , Humans , Prostate-Specific Antigen/genetics , Graphite/chemistry , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Biomarkers, Tumor/genetics , DNA, Single-Stranded , MicroRNAs/genetics